Objectives:


library(dplyr)
library(sf)
library(ggplot2)
library(osmdata)
library(leaflet)

Part 1. Import vector data from Open Street Map

What is Open Street Map?

Open Street Map (OSM) is a collaborative project which aims at mapping the world and sharing geospatial data in an open way. Anyone can contribute, by mapping geographical objects their encounter, by adding topical information on existing map objects (their name, function, capacity, etc.), or by mapping buildings and roads from satellite imagery (cf. HOT: Humanitarian OpenStreetMap Team).

This information is then validated by other users and eventually added to the common “map” or information system. This ensures that the information is accessible, open, verified, accurate and up-to-date.

The result looks like this: View of OSM web interface The geospatial data underlying this interface is made of geometrical objects (i.e. points, lines, polygons) and their associated tags (#building #height, #road #secondary #90kph, etc.).

How to extract geospatial data from Open Street Map?

Bounding-box

The first thing to do is to define the area within which you want to retrieve data, aka the bounding box. This can be defined easily using a place name and the function getbb() from the package osmdata.

“This function uses the free Nominatim API provided by OpenStreetMap to find the bounding box (bb) associated with place names.”

We are going to look at Brielle together, but you can also work with the small cities of Naarden, Geertruidenberg, Gorinchem, Enkhuizen or Dokkum.

/! This might not be needed, but ensures that your machine knows it has internet…

assign("has_internet_via_proxy", TRUE, environment(curl::has_internet))
bb <- getbb('Brielle', format_out = 'sf_polygon')
bb
Simple feature collection with 2 features and 0 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -74.08588 ymin: 40.08917 xmax: 4.229222 ymax: 51.93141
Geodetic CRS:  WGS 84
                        geometry
1 POLYGON ((4.135294 51.90795...
2 POLYGON ((-74.08588 40.0979...
  • Why multiple polygons?

Because there are different responses from the API query, corresponding to different objects at the same location, or different objects are different locations. Here, we have two options: Brielle (Netherlands) and Brielle (New Jersey)

Brielle, Netherlands Brielle, New Jersey

You can therefore try to be more precise by adding a country code or district name.

bb <- getbb('Brielle, NL', format_out = 'data.frame')
bb
   place_id
1 308228079
                                                                 licence
1 Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright
  osm_type  osm_id                                  boundingbox       lat
1 relation 2711001 51.8835002, 51.9314095, 4.1352938, 4.2292222 51.901662
        lon                                     display_name    class
1 4.1626292 Brielle, Voorne aan Zee, Zuid-Holland, Nederland boundary
            type importance
1 administrative  0.5957129
                                                                                  icon
1 https://nominatim.openstreetmap.org/ui/mapicons/poi_boundary_administrative.p.20.png

Extracting features

A feature in the OSM language is a category or tag of a geospatial object. Features are described by general keys (e.g. “building”, “boundary”, “landuse”, “highway”), themselves decomposed into sub-categories (values) such as “farm”, “hotel” or “house” for buildings, “motorway”, “secondary” and “residential” for highway. This determines how they are represented on the map.

x <- opq(bbox = bb) %>%
   add_osm_feature(key = 'building')%>%
    osmdata_sf()

What is this x object made of? It is a table of all the buildings contained in the bounding box, which gives us their OSM id, their geometry and a range of attributes, such as their name, building material, building date, etc. The completion level of this table depends on user contributions and open resources (here for instance: BAG, different in other countries).

head(x$osm_polygons)
Simple feature collection with 6 features and 46 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 4.135967 ymin: 51.904 xmax: 4.150993 ymax: 51.91276
Geodetic CRS:  WGS 84
           osm_id name addr:city addr:housename addr:housenumber addr:postcode
51054695 51054695 <NA>      <NA>           <NA>             <NA>          <NA>
51054698 51054698 <NA>      <NA>           <NA>             <NA>          <NA>
51054719 51054719 <NA>      <NA>           <NA>             <NA>          <NA>
51054751 51054751 <NA>      <NA>           <NA>             <NA>          <NA>
51054849 51054849 <NA>      <NA>           <NA>             <NA>          <NA>
51054858 51054858 <NA>      <NA>           <NA>             <NA>          <NA>
         addr:street alt_name amenity architect bridge:support building
51054695        <NA>     <NA>    <NA>      <NA>           <NA>      yes
51054698        <NA>     <NA>    <NA>      <NA>           <NA>      yes
51054719        <NA>     <NA>    <NA>      <NA>           <NA>      yes
51054751        <NA>     <NA>    <NA>      <NA>           <NA>      yes
51054849        <NA>     <NA>    <NA>      <NA>           <NA>      yes
51054858        <NA>     <NA>    <NA>      <NA>           <NA>      yes
         building:levels building:material building:name construction
51054695            <NA>              <NA>          <NA>         <NA>
51054698            <NA>              <NA>          <NA>         <NA>
51054719            <NA>              <NA>          <NA>         <NA>
51054751            <NA>              <NA>          <NA>         <NA>
51054849            <NA>              <NA>          <NA>         <NA>
51054858            <NA>              <NA>          <NA>         <NA>
         dance:teaching denomination description emergency fixme golf heritage
51054695           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
51054698           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
51054719           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
51054751           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
51054849           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
51054858           <NA>         <NA>        <NA>      <NA>  <NA> <NA>     <NA>
         heritage:operator historic image layer leisure level location
51054695              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
51054698              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
51054719              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
51054751              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
51054849              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
51054858              <NA>     <NA>  <NA>  <NA>    <NA>  <NA>     <NA>
             man_made natural old_name operator ref:bag ref:bag:old ref:rce
51054695 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
51054698 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
51054719 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
51054751 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
51054849 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
51054858 storage_tank    <NA>     <NA>     <NA>    <NA>        <NA>    <NA>
         religion roof:levels roof:shape   source source:date start_date
51054695     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
51054698     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
51054719     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
51054751     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
51054849     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
51054858     <NA>        <NA>       <NA> 3dShapes        <NA>       <NA>
         website wikidata wikipedia                       geometry
51054695    <NA>     <NA>      <NA> POLYGON ((4.136178 51.90762...
51054698    <NA>     <NA>      <NA> POLYGON ((4.136317 51.90752...
51054719    <NA>     <NA>      <NA> POLYGON ((4.14001 51.90721,...
51054751    <NA>     <NA>      <NA> POLYGON ((4.142694 51.90404...
51054849    <NA>     <NA>      <NA> POLYGON ((4.147362 51.91272...
51054858    <NA>     <NA>      <NA> POLYGON ((4.150993 51.91204...

Mapping attributes

For instance: the building age focusing on post-1900 buildings.

Projections

First, we are going to select the polygons and reproject them with the Amersfoort/RD New projection, suited for maps centred on the Netherlands. This code for this projection is: 28992.

buildings <- x$osm_polygons %>%
  st_transform(.,crs=28992)

Mapping

Then we create a variable which a threshold at 1900. Every date prior to 1900 will be recoded 1900, so that buildings older than 1900 will be represented with the same shade.

Then we use the ggplot function to visualise the buildings by age. The specific function to represent information as a map is geom_sf(). The rest works like other graphs and visualisation, with aes() for the aesthetics.

buildings$build_date <- as.numeric(
  if_else(
    as.numeric(buildings$start_date) < 1900, 
          1900, 
          as.numeric(buildings$start_date)
          )
  )

 ggplot(data = buildings) +
   geom_sf(aes(fill = build_date, colour=build_date))  +
   scale_fill_viridis_c(option = "viridis")+
   scale_colour_viridis_c(option = "viridis")

So this reveals the historical centre of [city] and the various extensions. Anything odd? what? around the centre? Why these limits / isolated points?

Challenge: import an interactive basemap layer under the buildings with Leaflet (15 min)

Some information here: leaflet package documentation

Solution

Part 2. Basic GIS operations with the sf package

Why sf?

sf is a package which supports simple features (sf), “a standardized way to encode spatial vector data.”. It contains a large set of functions to achieve all the operations on vector spatial data for which you might use traditional GIS software: change the coordinate system, join layers, intersect or unit polygons, create buffers and centroids, etc. cf. the sf cheatsheet.

Conservation in Brielle, NL

Let’s focus on old buildings and imagine we’re in charge of their conservation. We want to know how much of the city would be affected by a non-construction zone of 100m around pre-1800 buildings.

Let’s select them and see where they are.

old <- 1800 # year prior to which you consider a building old

summary(buildings$start_date)
   Length     Class      Mode 
     8530 character character 
buildings$start_date <- as.numeric(as.character(buildings$start_date))

old_buildings <- buildings %>%
  filter(start_date <= old)

 ggplot(data = old_buildings) + geom_sf(colour="red")

As conservationists, we want to create a zone around historical buildings where building regulation will have special restrictions to preserve historical buildings.

Buffer

Let’s say this zone should be 100 meters. In GIS terms, we want to create a buffer around polygons. The corresponding function sf is st_buffer, with 2 arguments: the polygons around which to create buffers, and the radius of the buffer.

 distance <- 100 # in meters 
 
buffer_old_buildings <- 
  st_buffer(x = old_buildings, dist = distance)
 
 ggplot(data = buffer_old_buildings) + geom_sf()

Union

Now, we have a lot of overlapping buffers. We would rather create a unique conservation zone rather than overlapping ones in that case. So we have to fuse the overlapping buffers into one polygon. This operation is called union and the corresponding function is st_union.

 single_old_buffer <- st_union(buffer_old_buildings) %>%
   st_cast(to = "POLYGON") %>%
  st_as_sf() 

single_old_buffer<- single_old_buffer %>%
  mutate("ID"=as.factor(1:nrow(single_old_buffer))) %>%
  st_transform(.,crs=28992) 

We also use st_cast() to explicit the type of the resulting object (POLYGON instead of the default MULTIPOLYGON), st_as_sf() to transform the polygon into an sf object.

Centroids

For the sake of visualisation speed, we would like to represent buildings by a single point (for instance: their geometric centre) rather than their actual footprint. This operation means defining their centroid and the corresponding function is st_centroid.

sf::sf_use_s2(FALSE)
centroids_old <- st_centroid(old_buildings) %>%
  st_transform(.,crs=28992)  

ggplot() + 
    geom_sf(data = single_old_buffer, aes(fill=ID)) +
    geom_sf(data = centroids_old)

Intersection

Now what we would like to distinguish conservation areas based on the number of historic buildings they contain. In GIS terms, we would like to know how many centroids each fused buffer polygon contains. This operation means intersecting the layer of polygons with the layer of points and the corresponding function is st_intersection.

 centroids_buffers <- 
  st_intersection(centroids_old,single_old_buffer) %>%
   mutate(n=1)

 centroid_by_buffer <- centroids_buffers %>%
   group_by(ID) %>%
   summarise(n = sum(n))
 single_buffer <- single_old_buffer %>%
   mutate(n_buildings = centroid_by_buffer$n)

  ggplot() + 
   geom_sf(data = single_buffer, aes(fill=n_buildings)) +
   scale_fill_viridis_c(alpha = 0.8,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B")

Final output:

Let’s map this layer over the initial map of individual buildings.

ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Exercise (10 min)

Challenge: Conservation rules have changed. The historical threshold now applies to all pre-war buildings, but the distance to these building is reduced to 10m. Can you map the number of all buildings per 10m fused buffer?


Solution

 old <- 1939 
 distance <- 10
 #select
 old_buildings <- buildings %>%
   filter(start_date <= old)
 #buffer
 buffer_old_buildings <- st_buffer(old_buildings, dist = distance)
  #union
 single_old_buffer <- st_union(buffer_old_buildings) %>%
   st_cast(to = "POLYGON") %>%
   st_as_sf()  
 
 single_old_buffer <- single_old_buffer %>%
   mutate("ID"=1:nrow(single_old_buffer))  %>%
   st_transform(single_old_buffer,crs=4326) 
 #centroids
 centroids_old <- st_centroid(old_buildings) %>%
   st_transform(.,crs=4326)  
  #intersection
  centroids_buffers <- st_intersection(centroids_old,single_old_buffer) %>%
   mutate(n=1)
 centroid_by_buffer <- centroids_buffers %>% 
   group_by(ID) %>%
   summarise(
   n = sum(n)
   )
 single_buffer <- single_old_buffer %>% 
   mutate(n_buildings = centroid_by_buffer$n)
  ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=n_buildings), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Problem: there are many pre-war buildings and the buffers are large so the number of old buildings is not very meaningful. Let’s compute the density of old buildings per buffer zone.

Area

single_buffer$area <- st_area(single_buffer, )  %>% units::set_units(., km^2)

single_buffer$old_buildings_per_km2 <- as.numeric(single_buffer$n_buildings / single_buffer$area)

 ggplot() + 
   geom_sf(data = buildings) +
   geom_sf(data = single_buffer, aes(fill=old_buildings_per_km2), colour = NA) +
   scale_fill_viridis_c(alpha = 0.6,
                        begin = 0.6,
                        end = 1,
                        direction = -1,
                        option = "B") 

Summary and keypoints

We have seen how OpenStreetMap (OSM) geodata works and how to import, select, and visualise OSM vector data. We have seen how to create spatial buffers and centroids, how to intersect vector data and how retrieve the area of polygons.

In short: